The American Heritage Intermediate Corpus

Author

  • Thomas M. Paikeday
Abstract

Sample studies of the five-million-word American Heritage Intermediate Corpus, the largest yet for English, show the inadequacies of computerized word counts for lexicographical purposes. Out of about 87,000 word types listed in the American Heritage Word Frequency Book (by John B. Carroll et alii, Houghton Mifflin-American Heritage, New York, 1971), only about 40 % seem distinctive lexical items. On the other hand, 22 % of the vocabulary entries required by the 50,000-entry dictionary for which the Corpus was prepared did not occur even once among the five million word tokens.

We first examined three randomly selected portions of the listing in the American Heritage Word Frequency Book (AHWFB) side by side with the corresponding entry lists of the American Heritage School Dictionary (AHSD), the Thorndike-Barnhart Intermediate Dictionary (TBID), the Merriam-Webster New Students Dictionary (WNSD), the Holt Intermediate Dictionary (HID), and the Random House School Dictionary (RHSD). Each of the three portions checked in AHWFB was based on a consolidated list of 100 entry words (300 in all), made up of the vocabulary entries in the dictionaries: (1) AFTER to AHAB, (2) CABBAGE to CALEB, and (3) LAKE to LANOLIN. Our aim was to find out how useful the Corpus was in assessing the vocabulary of Grades 3-9 and in adding new words to the lexicon of that level as presented in the dictionaries.

Here are some of our findings: There are approximately 110 word types in each sample of the main AHWFB listings (once-occurring entries listed at the bottom of pages were omitted) corresponding to each 100-entry-word portion from the dictionaries. Of the 110 word types, 36 % are lexically undistinctive items inadmissible as vocabulary entries in dictionaries. Such are: (a) hyphenated loose compounds, e.g. After-Shaving, agreed-on, and air-breathing; (b) solid forms of hyphenated compounds; (c) spellings with undistinctive initial capital instead of lowercase letter, such as After, Against, Age,
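
The point about undistinctive spellings can be illustrated with a minimal sketch. It is not the procedure used for the Corpus or the AHWFB: the token list is invented, and the `normalize` helper (lowercasing and dropping hyphens) is a hypothetical simplification chosen only to show how capitalized and hyphenated variants inflate a raw word-type count.

```python
from collections import Counter


def count_types(tokens):
    """Count raw word types: every distinct spelling is its own type."""
    return Counter(tokens)


def normalize(token):
    """Hypothetical normalization: fold case and drop hyphens.

    This is an assumption made for illustration; it would also merge
    proper nouns with common words, which a lexicographer would not want.
    """
    return token.lower().replace("-", "")


def count_normalized_types(tokens):
    """Count word types after merging case and hyphenation variants."""
    return Counter(normalize(t) for t in tokens)


# Toy token stream, invented for illustration (not taken from the Corpus).
tokens = [
    "After", "after", "Age", "age",
    "air-breathing", "airbreathing",
    "against", "Against", "lake",
]

raw = count_types(tokens)
merged = count_normalized_types(tokens)

print(len(raw), "raw word types")
print(len(merged), "word types after normalization")
```

In this toy list, nine raw spellings collapse to five items once case and hyphenation variants are merged. Blind case folding is too crude for real dictionary work, since entries such as AHAB or CALEB depend on their capital letter, which is one reason a mechanical count needs editorial review.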

Similar resources

The Corpus of American Norwegian Speech (CANS)

This paper contains a description of the Corpus of American Norwegian Speech, a new tool for heritage language research. We present the background for its existence, the linguistic contents and its main technical features. The demonstration will show the corpus in use, focussing on problems that are specific to heritage language research, and how the corpus can be searched to provide relevant d...

The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...

Grammatical Gender in American Norwegian Heritage Language: Stability or Attrition?

This paper investigates possible attrition/change in the gender system of Norwegian heritage language spoken in America. Based on data from 50 speakers in the Corpus of American Norwegian Speech (CANS), we show that the three-gender system is to some extent retained, although considerable overgeneralization of the masculine (the most frequent gender) is attested. This affects both feminine and ...

The Impact of Teaching Corpus-based Collocation on EFL Learners' Writing Ability

Abstract The present study explores the impact of corpus-based collocation instruction on intermediate Iranian EFL learners' writing ability. For this study, 84 Iranian learners, studying English as a foreign language in Bayan Institute, Iran, were selected and were randomly divided into two groups, experimental and control. Conventional methods of writing instruction were taught to the control...

Publication date: 1973